“Everything is related to everything else, but near things are more related than distant things.”
# Libraries
library(tidyverse)
library(spdep)
library(rgdal)
library(sf)
library(sp)
library(rgeos)
library(terra)
library(DT)
colours = colorRampPalette(c('#fedb71','#dd696d'))
The necessary data for this type of analysis are:
# Read municipality data
tn <- readOGR("../data/aggregated_data_per_municipality", encoding="UTF-8")
## OGR data source with driver: ESRI Shapefile
## Source: "G:\Il mio Drive\2nd Year\Geospatial\TrentinoSchools\data\aggregated_data_per_municipality", layer: "aggregated_data_per_municipality"
## with 166 features
## It has 16 fields
## Integer64 fields read as strings: Pop under Pop_mat Pop_ele Pop_med Pop_sup Popolazion
# Setting the CRS
tn <- spTransform(tn, CRS("+init=epsg:4326"))
# Function to replace NAs with 0
na.zero <- function (x) {
x[is.na(x)] <- 0
return(x)
}
names(tn@data)[3] = "Scuole"
names(tn@data)[7] = "Media_stud_classe"
names(tn@data)[8] = "Media_stud_scuola"
names(tn@data)[9] = "Pop_under_20"
names(tn@data)[15] = "Pop_under_20_Pop_tot"
names(tn@data)[16] = "Stud_Pop_under_20"
# Replace NAs about municipalities' number of schools with 0s
tn@data$Scuole = na.zero(tn@data$Scuole)
# Plot the municipalities in Trentino
par(mar=c(0,0,0,0))
plot(tn, axes = F, border="grey")
It is worth noticing from the map of school points that many of them overlap, especially in the Adige’s Valley and most populated areas, such as Trento, Rovereto and Riva del Garda.
# Import shapefile as SpatialPointsDataFrame
schools <- readOGR("../data/Trentino/schools/schools.shp",verbose = FALSE)
# Setting the CRS
schools <- spTransform(schools, CRS("+init=epsg:4326"))
# Plot schools over Trentino's map
par(mar=c(0,0,0,0))
plot(tn, border = "grey", axes = F)
points(schools@coords, col = "#f27059", cex = 1, pch = 1)
Before starting with the global analysis of Trentino municipalities, it is necessary to select some representative points for each municipality as unique reference to the spatial coordinate.
par(mar=c(0,0,0,0))
plot(tn, border="grey")
points(coordinates(tn),
col="red",
bg = "#EF798A",
pch = 21,
lwd = 1.5)
Commonly, the centroid is a good choice, but still some problems can emerge and the centroid may not fall inside the boundaries of the territory. In this particular case, this may happen with multipolygons shape, i.e. those municipalities represented by a multiple number of polygons that do not share a boundary. Some examples are the municipalities of Tione di Trento, Ronzone, Stenico, Calliano, Pellizzano and Riva del Garda. Some municipalities, as Luserna, find their centroid in another territory because of their shape.
The plot depicts all those municipalities whose centroid is outside their actual territory. As can be seen, most of them have little zones outside the main one, while others (e.g. Predaia and Luserna) just have awkward shapes.
tn_coords = coordinates(tn)
tn@data$centroid = tn_coords
colorize = c()
for(i in 1:166){
colorize = append(colorize, point.in.polygon(tn$centroid[i,1],
tn$centroid[i,2],
tn@polygons[i][[1]]@Polygons[[1]]@coords[,1],
tn@polygons[i][[1]]@Polygons[[1]]@coords[,2]))
}
par(mar = c(0,0,0,0))
plot(tn, col = ifelse(colorize, "white","lightgrey"), border="grey")
text(tn_coords, labels = ifelse(colorize, "" ,tn@data$Comune), cex=0.6)
Instead of recurring to coordinates(), we could use
gCentroid to obtain an alternative version of centroids.
For most of the cases, these two versions coincide, but for those
municipalities with multiple polygons points may differ (e.g. Soraga di
Fassa in the upper right part). Since it computes a sort of mean point,
making the centroid being part of other territories, the rest of the
notebook will relate to the previous version
(i.e. coordinates), despite inaccurate in some cases.
par(mar=c(0,0,0,0))
trueCentroids = gCentroid(tn,byid=TRUE)
plot(tn, border="grey")
points(coordinates(tn),pch=1, col="blue")
points(trueCentroids,pch=2, col="red")
⚠️ Note that, instead, school dataset contains points and not polygons, therefore no problem occurs with the centroid computation. Also, since many schools have same coordinates because in the same building, the centroids will only consider unique points, reducing the dataset from 724 to 599 unique points.
Since there are various definitions of neighbourhood, we will try to explore three of them in the following sections, starting from K-nearest neighbour.
Since there is no way to choose a specific value for \(k\), we can iterate over a customized range, let’s say from 1 to 20 neighbours (i.e. schools). The following code generates an image for each value of \(k\).
# Saving schools coordinates
school_coords = coordinates(schools)
# Save frame per frame
for(i in 1:20){
png(paste0("../viz/knn/",i,".png"),res=300, width=1000, height=1000)
k <- knn2nb(knearneigh(school_coords, k = i, longlat=T))
par(mar = c(0,0,0,0))
plot(tn, border = "grey80", axis = tn, lwd=0.5)
plot(k, school_coords, lwd=.6, col=alpha("#F27059",alpha=0.5),
cex = .3, add=TRUE, points=FALSE)
dev.off()
}
⚠️ Note that through the function saveGIF()4 it would
be possible to save frames as animation, lowering the resolution of the
image obtained. Saving each frame will also allow users to select a
customized value for \(k\) and look at
how the map changes inside the website.
KNN on Trentino Schools
As \(k\) is increased, each school finds more and more neighbours, approaching also further points. If we focus on the first frame with \(k=1\), we may notice that some schools are isolated, since their closest neighbour requires long lines to be reached. An example is the municipality of Vermiglio, in the upper left part of Trentino, in the western boundary. There are in fact \(3\) schools in Vermiglio and one of them is near the boundary, far from other schools. A similar situation happens to Rabbi and Malé, to Luserna and Lavarone, but still Vermiglio is the municipality with the longest distance between schools within the local territory.
k <- knn2nb(knearneigh(school_coords, k = 1, longlat=T))
par(mar = c(0,0,0,0))
plot(tn, border = "grey80", axis = tn, lwd=0.5)
plot(k, school_coords, lwd=.6,
col=alpha("#F27059",alpha=0.5),
cex = .3, add=TRUE, points=FALSE)
text(tn_coords, labels = ifelse(tn@data$Comune %in% c("Vermiglio","Rabbi","Malé","Luserna","Lavarone","Predaia"),tn@data$Comune, ""), cex=0.6)
On the other hand, at the extreme opposite, with \(k=20\), we obtain the network of schools in Trentino, looking at the \(20\) closest schools around each point. Trento and Rovereto are the most intertwined zones, while green areas as Adamello-Brenta Natural Park (west boundary) and Valsugana (empty zone in the right part of the map) lack in the number of schools. In fact, as can be seen by the length of lines, distances between schools tend to increase by moving from the Adige Valley to the east and west boundary.
KNN with k=20
If instead we focus on the municipalities, we can limit to a lower \(k\), since we talk about polygons with other territories at their boundary, while some schools points may result isolated with low values of \(k\). By looking at \(k=5\) plot, we may notice how all municipalities are linked to each other, despite it does not seem so in the territory above Borgo Valsugana, where Dolomites can be found.
knn1 = knn2nb(knearneigh(tn_coords, k = 1, longlat=T))
knn2 = knn2nb(knearneigh(tn_coords, k = 2, longlat=T))
knn3 = knn2nb(knearneigh(tn_coords, k = 3, longlat=T))
knn4 = knn2nb(knearneigh(tn_coords, k = 4, longlat=T))
knn5 = knn2nb(knearneigh(tn_coords, k = 5, longlat=T))
par(mar=c(0,0,0,0))
plot(tn, border = "grey80", axis = tn, lwd=0.5)
plot(knn5, tn_coords, lwd=.6, col=alpha("#F27059",alpha=0.5),
cex = .3, add=TRUE, points=FALSE)
Our aim in this section will be to find the minimum threshold distance which allows all regions/points to have at least one neighbour. By setting \(k=1\) in the k-nearest neighbour, we can first compute the nearest neighbour to each school and the relative distance and then get the maximum distance among them.
knn1<- knn2nb(knearneigh(school_coords,
k=1,
longlat=T))
all.linked <- max(unlist(nbdists(knn1,
school_coords,
longlat=T)))
all.linked
## [1] 9.182375
According to the results, all schools have a neighbour at at least 9.1823746 km. This implies that the cut-off distance has to be greater than it. However, notice from the following plot the distribution of school distances: the majority of them is near \(0\)km, following a long tail distribution. This may happen in big cities, as Trento and Rovereto, where there are a lot of schools and the minimum distance between them lowers.
distances = unlist(nbdists(knn1,school_coords,longlat=T))
ggplot()+
geom_histogram(aes(x=distances), fill='#F27059', bins=50)+
labs(title = "Distribution of distances between schools")+
theme_minimal()
knn1<- knn2nb(knearneigh(tn_coords,
k=1,
longlat=T))
all.linked <- max(unlist(nbdists(knn1,
tn_coords,
longlat=T)))
all.linked
## [1] 11.16466
We can repeat the same computation on municipalities centroids, discovering that every municipality has a neighbour at at least 11.1646609 km, slightly greater than the previous cut-off distance, which could mean that there are multiple schools in every municipality or that they are close enough to the boundary to be neighbour of other municipalities’ schools. Analyzing the distribution of distances between municipalities, we may notice that the majority of them distances from others from \(2\) to \(4\)km.
distances = unlist(nbdists(knn1,tn_coords,longlat=T))
ggplot()+
geom_histogram(aes(x=distances), fill='#F27059', bins=50)+
labs(title="Distribution of distances between schools")+
theme_minimal()
We can try different neighbourhood definitions for different values of the cut-off distance, starting from the minimum threshold found before (i.e. \(9.18\)km).
dnb10 <- dnearneigh(school_coords, 0, 10, longlat=TRUE); dnb10
## Neighbour list object:
## Number of regions: 724
## Number of nonzero links: 54326
## Percentage nonzero weights: 10.36408
## Average number of links: 75.03591
dnb15 <- dnearneigh(school_coords, 0, 15, longlat=TRUE); dnb15
## Neighbour list object:
## Number of regions: 724
## Number of nonzero links: 93048
## Percentage nonzero weights: 17.75129
## Average number of links: 128.5193
dnb20 <- dnearneigh(school_coords, 0, 20, longlat=TRUE); dnb20
## Neighbour list object:
## Number of regions: 724
## Number of nonzero links: 140056
## Percentage nonzero weights: 26.71927
## Average number of links: 193.4475
dnb25 <- dnearneigh(school_coords, 0, 25, longlat=TRUE); dnb25
## Neighbour list object:
## Number of regions: 724
## Number of nonzero links: 192010
## Percentage nonzero weights: 36.63083
## Average number of links: 265.2072
dnb30 <- dnearneigh(school_coords, 0, 30, longlat=TRUE); dnb30
## Neighbour list object:
## Number of regions: 724
## Number of nonzero links: 247346
## Percentage nonzero weights: 47.18759
## Average number of links: 341.6381
As the cut-off distance increases, the number of links grows rapidly. Based on the visualization, we could have stopped at 20, where nearly every school is connected to others.
plot_neighbour = function(model, coords, title){
par(mar=c(0,0,1,0))
plot(tn, border="grey",xlab="",ylab="",xlim=NULL)
title(main=title, cex.main=0.8)
plot(model, coords, add=TRUE, col="#F27059", pch=16, lwd = 1, points=FALSE)
}
par(mfrow = c(3,2))
plot_neighbour(dnb10, school_coords, "d nearest neighbours, d = 10")
plot_neighbour(dnb15, school_coords, "d nearest neighbours, d = 15")
plot_neighbour(dnb20, school_coords, "d nearest neighbours, d = 20")
plot_neighbour(dnb25, school_coords, "d nearest neighbours, d = 25")
plot_neighbour(dnb30, school_coords, "d nearest neighbours, d = 30")
The same approach could be applied to municipalities data, obviously creating a network based on the territories around a certain area. Remembering that the cut-off threshold in this case is above \(11\), we can start with \(12\).
dnb12 <- dnearneigh(tn_coords, 0, 12, longlat=TRUE); dnb12
## Neighbour list object:
## Number of regions: 166
## Number of nonzero links: 2000
## Percentage nonzero weights: 7.257947
## Average number of links: 12.04819
dnb16 <- dnearneigh(tn_coords, 0, 16, longlat=TRUE); dnb16
## Neighbour list object:
## Number of regions: 166
## Number of nonzero links: 3314
## Percentage nonzero weights: 12.02642
## Average number of links: 19.96386
dnb20 <- dnearneigh(tn_coords, 0, 20, longlat=TRUE); dnb20
## Neighbour list object:
## Number of regions: 166
## Number of nonzero links: 4866
## Percentage nonzero weights: 17.65859
## Average number of links: 29.31325
dnb24 <- dnearneigh(tn_coords, 0, 24, longlat=TRUE); dnb24
## Neighbour list object:
## Number of regions: 166
## Number of nonzero links: 6712
## Percentage nonzero weights: 24.35767
## Average number of links: 40.43373
dnb30 <- dnearneigh(tn_coords, 0, 30, longlat=TRUE); dnb30
## Neighbour list object:
## Number of regions: 166
## Number of nonzero links: 9652
## Percentage nonzero weights: 35.02685
## Average number of links: 58.14458
par(mfrow = c(3,2))
plot_neighbour(dnb12, tn_coords, "d nearest neighbours, d = 12")
plot_neighbour(dnb16, tn_coords, "d nearest neighbours, d = 16")
plot_neighbour(dnb20, tn_coords, "d nearest neighbours, d = 20")
plot_neighbour(dnb24, tn_coords, "d nearest neighbours, d = 24")
plot_neighbour(dnb30, tn_coords, "d nearest neighbours, d = 30")
Also in this case the number of connections grows rapidly, indicating how close the municipalities are between each other. Consider that the maximum distance within province of Trento between municipalities is around 120km (considering the centroids, therefore the actual distance is greater), but over the \(75\%\) of municipalities has an area below \(50km^2\), which allows them to be connected with brief distances.
# Quantiles of areas of municipalities within the Province of Trento
tn@data$area = round(area(tn)/ 1000000,3)
area = tn@data %>%
arrange(desc(area)) %>%
select(Comune, area)
quantile(area$area)
## 0% 25% 50% 75% 100%
## 1.65200 12.95100 25.57700 48.97075 199.55200
# Find max distance within the province centroids
library(geosphere)
## Warning: il pacchetto 'geosphere' è stato creato con R versione 4.1.2
diff = c()
for(i in 1:166 ){
for (j in 1:166){
diff = append(diff, distm(tn_coords[i,],tn_coords[j,], fun = distHaversine))
}
}
# Maximum distance between centroids within the province of Trento
max(diff)/1000
## [1] 120.7196
Since schools are represented as points, we will use municipalities data to connect territories with common boundary, i.e. multiple municipalities around. From the visualization it is worth noticing the spiderweb created around the municipalities on the inside of Trentino, while those more disconnected are placed on the border of the Province, especially the territories on the upper right part of the map (e.g. Canazei, San Giovanni di Fassa, Mazzin, Campitello di Fassa, Soraga di Fassa, Moena).
par(mar=c(0,0,0,0))
contnb_q <- poly2nb(tn, queen=T)
plot(tn, border="grey")
plot(contnb_q, tn_coords, add=TRUE, col="#EF798A")
points(coordinates(tn),
col="red",
bg = "#EF798A",
pch = 21,
lwd = 1.5)
⚠️ Note that there are 166 municipalities in the Province of Trento. By sorting them according to the shape area, we get that the biggest areas do not share at least one boundary, since they take place on the border or in mountainous zones; while Trento occupies a central position.
area%>%
DT::datatable()
After the definition of neighbourhoods, we can create spatial weights matrices, one for each neighbour list previously created.
# K-nearest neighbour
knn1.list = nb2listw(knn1)
knn2.list = nb2listw(knn2)
knn3.list = nb2listw(knn3)
knn4.list = nb2listw(knn4)
knn5.list = nb2listw(knn5)
# Critical cut-off
dnb12.list = nb2listw(dnb12,style="W")
dnb16.list = nb2listw(dnb16,style="W")
dnb20.list = nb2listw(dnb30,style="W")
dnb24.list = nb2listw(dnb24,style="W")
dnb30.list = nb2listw(dnb30,style="W")
# Contiguity based approach
contnb_q.list = nb2listw(contnb_q)
# List with weights lists and their name
weights = list(
list(knn1.list, "K-nearest neighbour (k=1)"),
list(knn2.list, "K-nearest neighbour (k=2)"),
list(knn3.list, "K-nearest neighbour (k=3)"),
list(knn4.list, "K-nearest neighbour (k=4)"),
list(knn5.list, "K-nearest neighbour (k=5)"),
list(dnb12.list, "Critical cut-off neighbourhood (d=12)"),
list(dnb16.list, "Critical cut-off neighbourhood (d=16)"),
list(dnb20.list, "Critical cut-off neighbourhood (d=20)"),
list(dnb24.list, "Critical cut-off neighbourhood (d=24)"),
list(dnb30.list, "Critical cut-off neighbourhood (d=30)"),
list(contnb_q.list, "Contiguity-based neighbourhoord")
)
In the section, we will focus on Moran’s I test of spatial autocorrelation of Trentino Schools, in particular on the number of schools and students that populate every municipality. Let’s start plotting the distribution of some features over the territory.
⚠️ Note that we do not hold data about students of all schools in Trentino, therefore some data might be missing. In these cases, the plots below will show the respective area in white.
cols = list(tn$Scuole,tn$Studenti, tn$Classi, tn$Media_stud_scuola, tn$Pop_under_20_Pop_tot, tn$Stud_Pop_under_20, tn$Media_stud_classe)
titles = c("Schools","Students","Classes","Mean of students per school","Population under 20 over Total Population", "Students over Population under 20", "Mean students per class")
colours <- c("#fedb71","#FCD471","#facd71","#f6bf70","#eea26f","#EA946F","#e6866e","#e2786e","#e0716e","#dd696d")
na.ignore = function(x){
x[is.na(x)] <- -1
return(x)
}
par(mfrow=c(4,2),mar = c(0,0,1.7,0))
for(i in 1:7){
c = na.ignore(unlist(cols[i]))
brks <- round(quantile(c, seq(0,1,0.1)), digits=3)
plot(tn, col=ifelse(c==-1,
"#ffffff",
colours[findInterval(c, brks, all.inside=TRUE)]),
main = titles[i], cex.main=2.5)
}
Now we can try to compute the Moran’s test based on all the previous definitions of neighborhood and with the previous features exposed, trying to find some spatial autocorrelation.
Neighbourhood = c()
Column = c()
Sd = c()
p_value = c()
Moran_I_statistic = c()
Mean = c()
Var = c()
Assumption = c()
# Iterate over columns
for(i in 1:length(cols)){
# Iterate over neighbourhood
for (w in weights) {
c = na.zero(unlist(cols[i]))
# Iterate over assumptions
for (rand in c(T,F)) {
Neighbourhood = append(Neighbourhood, w[[2]])
res = moran.test(c, w[[1]], randomisation = rand)
Column = append(Column, titles[i])
Sd = append(Sd, round(as.numeric(res[[1]]),4))
p_value = append(p_value, round(as.numeric(res[[2]]), 4))
Moran_I_statistic = append(Moran_I_statistic, round(as.numeric(res[[3]][1]),4))
Mean = append(Mean, round(as.numeric(res[[3]][2]),4))
Var = append(Var, round(as.numeric(res[[3]][3]),4))
if(rand) {
Assumption = append(Assumption, "Randomization")
}else{
Assumption = append(Assumption, "Normality")
}
}
Neighbourhood = append(Neighbourhood, w[[2]])
res = moran.mc(c, w[[1]], nsim=999)
Column = append(Column, titles[i])
Sd = append(Sd,round(as.numeric(res[[1]]),4))
p_value = append(p_value, round(res$p.value),4)
Moran_I_statistic = append(Moran_I_statistic, round(res$statistic,4))
Mean = append(Mean, "")
Var = append(Var, "")
Assumption = append(Assumption, res$method)
}
}
# create df with results and show them
res_df = data.frame(Column, Neighbourhood, Moran_I_statistic, p_value,
Sd, Mean, Var, Assumption)
res_df %>%
arrange(desc(abs(Moran_I_statistic)), p_value) %>%
DT::datatable()
By ordering results according to the absolute value of the Moran’s I statistics, we can see that the highest statistics obtained are around the \([0.1789, 0.1952]\) interval, got in all three assumptions. Albeit mean students per school seem to be the column with highest Moran’s statistics, the p-value is not below the threshold of \(0.05\) for the majority of its observations. In fact, the mean of students per school results significative with neighbourhoods knn (k=5) and critical cut-off with d=16.
On the other hand, the proportion of Population under 20 over the total population seems significative with every configuration, except for some knns neighbourhoods. However, Moran’s statistics is lower than the ones gathered considering the mean of students per school.
Both these two columns show a positive spatial autocorrelation, while negative ones are associated to the number of Students over the Population under 20, with low p-value and minimum value \(-0.1296\) for Moran’s I statistics.
res_df%>%
group_by(Column) %>%
summarise("Median Moran" = median(Moran_I_statistic), "Median p-value" = median(p_value),
"Mean Moran" = round(mean(Moran_I_statistic),4), "Mean p-value" = round(mean(p_value),4)) %>%
DT::datatable()
Considering an aggregated table with median and mean statistics and p-value, we can confirm that the columns with highest statistics are:
The low mean and median p-values for Students confirms the absence of spatial autocorrelation based on the number of students. Nearly the same happens to the number of Classes and Schools.
Let’s start by trying to model the mean students per class with all the remaining features we have explored. The summary shows that all predictors are significant, except for the population under 20 over total population.
LinearMean <- lm(Media_stud_classe ~ Stud_Pop_under_20+Scuole+Classi+Media_stud_scuola+Pop_under_20_Pop_tot+Studenti, tn)
summary(LinearMean)
##
## Call:
## lm(formula = Media_stud_classe ~ Stud_Pop_under_20 + Scuole +
## Classi + Media_stud_scuola + Pop_under_20_Pop_tot + Studenti,
## data = tn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.440 -1.101 0.160 1.049 4.434
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.258131 1.617090 5.107 1.20e-06 ***
## Stud_Pop_under_20 1.939540 0.578944 3.350 0.00107 **
## Scuole 0.590475 0.113650 5.196 8.13e-07 ***
## Classi -0.200131 0.044154 -4.533 1.35e-05 ***
## Media_stud_scuola 0.039774 0.003293 12.079 < 2e-16 ***
## Pop_under_20_Pop_tot 5.294348 8.533140 0.620 0.53610
## Studenti 0.006696 0.002175 3.079 0.00256 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.787 on 124 degrees of freedom
## (35 osservazioni eliminate a causa di valori mancanti)
## Multiple R-squared: 0.7054, Adjusted R-squared: 0.6912
## F-statistic: 49.49 on 6 and 124 DF, p-value: < 2.2e-16
With the step function it is possible to simplify this model considering only those features with high significance, excluding the features with no significance (i.e. Pop_under_20_Pop_tot).
# Searching for a simplified model where every feature has high significance
LinearMean = step(LinearMean)
## Start: AIC=158.97
## Media_stud_classe ~ Stud_Pop_under_20 + Scuole + Classi + Media_stud_scuola +
## Pop_under_20_Pop_tot + Studenti
##
## Df Sum of Sq RSS AIC
## - Pop_under_20_Pop_tot 1 1.23 397.40 157.38
## <none> 396.17 158.97
## - Studenti 1 30.29 426.45 166.62
## - Stud_Pop_under_20 1 35.86 432.03 168.32
## - Classi 1 65.64 461.80 177.05
## - Scuole 1 86.24 482.41 182.77
## - Media_stud_scuola 1 466.11 862.28 258.85
##
## Step: AIC=157.38
## Media_stud_classe ~ Stud_Pop_under_20 + Scuole + Classi + Media_stud_scuola +
## Studenti
##
## Df Sum of Sq RSS AIC
## <none> 397.40 157.38
## - Studenti 1 29.26 426.66 164.68
## - Stud_Pop_under_20 1 34.66 432.06 166.33
## - Classi 1 64.61 462.01 175.11
## - Scuole 1 89.16 486.56 181.89
## - Media_stud_scuola 1 591.27 988.66 274.77
summary(LinearMean)
##
## Call:
## lm(formula = Media_stud_classe ~ Stud_Pop_under_20 + Scuole +
## Classi + Media_stud_scuola + Studenti, data = tn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2233 -1.0388 0.1256 1.0652 4.3898
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.228110 0.412372 22.378 < 2e-16 ***
## Stud_Pop_under_20 1.884257 0.570637 3.302 0.00125 **
## Scuole 0.597451 0.112814 5.296 5.16e-07 ***
## Classi -0.197897 0.043899 -4.508 1.49e-05 ***
## Media_stud_scuola 0.040634 0.002980 13.637 < 2e-16 ***
## Studenti 0.006533 0.002154 3.034 0.00294 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.783 on 125 degrees of freedom
## (35 osservazioni eliminate a causa di valori mancanti)
## Multiple R-squared: 0.7045, Adjusted R-squared: 0.6927
## F-statistic: 59.6 on 5 and 125 DF, p-value: < 2.2e-16
The plot of the studentized residuals of the linear model can give us a hint about the presence of spatial dependence in the residuals. In fact, some similarities may be found in the east part, near the border and within the Adamello Brenta Park.
par(mar=c(0,0,1,0))
studres <- rstudent(LinearMean)
resdistr <- round(quantile(studres, seq(0,1,0.25)), digits=3)
colours_5 <- colours(5)
plot(tn, col=colours_5[findInterval(studres, resdistr, all.inside=TRUE)],
main = "Residuals quantiles in Trentino")
The command that allows to perform the Moran’s I test in the OLS
residuals is the function lm.morantest(). In the following
chunk, the test to the studentized residuals of the linear Solow model,
for different specifications of the spatial weights matrix. This method
will be applied to all neighbourhoods definition, except KNN with \(k\) lower than 4 and the contiguity
neighbourhood, since they return unusual results.
ols_res = data.frame(Neighbourhood = c(""),
Moran = c(""),
p_value = c(""))
# Moran test on residuals
for(i in 4:11){
t = lm.morantest(LinearMean,weights[[i]][[1]],resfun=rstudent)
ols_res = rbind(ols_res, c(weights[[i]][[2]],
t$estimate['Observed Moran I'],
t$p.value))
}
ols_res$Moran = as.numeric(ols_res$Moran)
ols_res$p_value = as.numeric(ols_res$p_value)
ols_res %>%
arrange(desc(abs(Moran))) %>%
filter(!is.na(Moran)) %>%
DT::datatable()
The obtained results provide a negative spatial autocorrelation, but the p-value is far from being below to the threshold to confirm this hypothesis. Therefore, no spatial autocorrelation is found in the residuals of this model.
By focusing instead on the population under 20 over the total population for each municipality, we obtain from the summary that only the mean students per school is highly significative, while the students over population under 20’s p-value is slightly above \(0.05\).
LinearPop <- lm(tn$Pop_under_20_Pop_tot ~ Stud_Pop_under_20+Scuole+Classi+Media_stud_scuola+Media_stud_classe+Studenti, tn)
summary(LinearPop)
##
## Call:
## lm(formula = tn$Pop_under_20_Pop_tot ~ Stud_Pop_under_20 + Scuole +
## Classi + Media_stud_scuola + Media_stud_classe + Studenti,
## data = tn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.059060 -0.011379 0.002452 0.010725 0.045638
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.778e-01 9.719e-03 18.296 <2e-16 ***
## Stud_Pop_under_20 -1.154e-02 6.268e-03 -1.842 0.0679 .
## Scuole 9.683e-04 1.315e-03 0.736 0.4629
## Classi 5.377e-04 4.986e-04 1.078 0.2829
## Media_stud_scuola 1.387e-04 4.950e-05 2.802 0.0059 **
## Media_stud_classe 5.846e-04 9.422e-04 0.620 0.5361
## Studenti -3.461e-05 2.351e-05 -1.472 0.1435
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01878 on 124 degrees of freedom
## (35 osservazioni eliminate a causa di valori mancanti)
## Multiple R-squared: 0.184, Adjusted R-squared: 0.1445
## F-statistic: 4.659 on 6 and 124 DF, p-value: 0.0002638
By applying the step function, the mean of students per class and the number of classes are erased. However, the adjusted r-squared is too low to guarantee the quality of this model.
# Searching for a simplified model where every feature has high significance
LinearPop = step(LinearPop)
## Start: AIC=-1034.61
## tn$Pop_under_20_Pop_tot ~ Stud_Pop_under_20 + Scuole + Classi +
## Media_stud_scuola + Media_stud_classe + Studenti
##
## Df Sum of Sq RSS AIC
## - Media_stud_classe 1 0.00013579 0.043877 -1036.2
## - Scuole 1 0.00019128 0.043933 -1036.0
## - Classi 1 0.00041026 0.044152 -1035.4
## <none> 0.043742 -1034.6
## - Studenti 1 0.00076467 0.044506 -1034.3
## - Stud_Pop_under_20 1 0.00119657 0.044938 -1033.1
## - Media_stud_scuola 1 0.00276903 0.046511 -1028.6
##
## Step: AIC=-1036.2
## tn$Pop_under_20_Pop_tot ~ Stud_Pop_under_20 + Scuole + Classi +
## Media_stud_scuola + Studenti
##
## Df Sum of Sq RSS AIC
## - Classi 1 0.0002938 0.044171 -1037.3
## - Scuole 1 0.0004336 0.044311 -1036.9
## - Studenti 1 0.0006498 0.044527 -1036.3
## <none> 0.043877 -1036.2
## - Stud_Pop_under_20 1 0.0010645 0.044942 -1035.1
## - Media_stud_scuola 1 0.0094505 0.053328 -1012.6
##
## Step: AIC=-1037.33
## tn$Pop_under_20_Pop_tot ~ Stud_Pop_under_20 + Scuole + Media_stud_scuola +
## Studenti
##
## Df Sum of Sq RSS AIC
## <none> 0.044171 -1037.3
## - Scuole 1 0.0007497 0.044921 -1037.1
## - Stud_Pop_under_20 1 0.0007708 0.044942 -1037.1
## - Studenti 1 0.0009575 0.045129 -1036.5
## - Media_stud_scuola 1 0.0092138 0.053385 -1014.5
summary(LinearPop)
##
## Call:
## lm(formula = tn$Pop_under_20_Pop_tot ~ Stud_Pop_under_20 + Scuole +
## Media_stud_scuola + Studenti, data = tn)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.059266 -0.011806 0.002099 0.010582 0.046054
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.826e-01 4.277e-03 42.690 < 2e-16 ***
## Stud_Pop_under_20 -7.607e-03 5.130e-03 -1.483 0.141
## Scuole 1.649e-03 1.128e-03 1.462 0.146
## Media_stud_scuola 1.596e-04 3.113e-05 5.127 1.08e-06 ***
## Studenti -1.100e-05 6.658e-06 -1.653 0.101
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.01872 on 126 degrees of freedom
## (35 osservazioni eliminate a causa di valori mancanti)
## Multiple R-squared: 0.1759, Adjusted R-squared: 0.1498
## F-statistic: 6.725 on 4 and 126 DF, p-value: 6.134e-05
ols_res = data.frame(Neighbourhood = c(""),
Moran = c(""),
p_value = c(""))
# Moran test on residuals
for(i in 4:11){
t = lm.morantest(LinearPop,weights[[i]][[1]],resfun=rstudent)
ols_res = rbind(ols_res, c(weights[[i]][[2]],
t$estimate['Observed Moran I'],
t$p.value))
}
ols_res$Moran = as.numeric(ols_res$Moran)
ols_res$p_value = as.numeric(ols_res$p_value)
ols_res %>%
arrange(desc(abs(Moran))) %>%
filter(!is.na(Moran)) %>%
DT::datatable()
In this case instead, we can pay attention to the p-value, which is above \(0.05\) in all cases, except for knn with \(k=5\). This may indicate the presence of spatial autocorrelation in the residuals and consequently that the model is misspecified. Equivalent results have been obtained by rotating and discarding some predictors.
If spatial autocorrelation is present it will violate the assumption about the independence of residuals and call into question the validity of hypothesis testing6, leading to the rejection of the Population under 20 over the total population. However, since the spatial autocorrelation is found only in a single neighbourhood, it will still be studied in the next sections.
By looking at the plot we can see how the residuals in the most populated areas, as Trento and Rovereto, are low (i.e. yellow), while smaller municipalities around are red (i.e. high residuals).
par(mar=c(0,0,1,0))
studres <- rstudent(LinearPop)
resdistr <- round(quantile(studres, seq(0,1,0.25)), digits=3)
plot(tn, col=colours_5[findInterval(studres, resdistr, all.inside=TRUE)],
main = "Residuals quantiles in Trentino")
The following grid of images shows the Moran’s Scatterplot within all different notions of neighborhoods. The membership of municipalities inside a quadrant or another changes according to the neighbourhood definition. For instance, Novaledo often switches from HH to HL quadrant and Valfloriana from LH to LL and viceversa. Also, since this feature is a proportion that goes from \(0.10\) to \(0.24\), many municipalities overlap or assume the same value (columns of dots). Nevertheless, some outliers are noticeable, represented with a different style and their label.
As stated forehead, the quadrants with the pink line and its confidence interval shows the municipalities with similar proportion of students population over the total one (i.e. positive spatial autocorrelation). An example is the relationship between Novaledo and Vignola-Falesina, which are usually in the same HH quadrant, have a high proportion of students and they are close, geographically talking. On the opposite side, Cinte Tesino and Castello Tesino show a low value for students over the population.
On the other hand, municipalities on the remaining quadrants, have dissimilar values if they belong to opposite quadrants. For example, Fierozzo and Frassilongo share a boundary, but their proportion of students is \(0.22\) and \(0.15\) respectively, enhancing a big dissimilarity between these two municipalities.
This chunk shows how many municipalities belong to a specific quadrant, based on the different concepts of neighbourhood. It seems like 15 or less over 166 municipalities have a spatial autocorrelation and belong to a specific quadrant. These hotspots are sequently visualized with a colour, whose association is related to the membership to a specific quadrant.
Since the membership to quadrants is divided over the three different concepts of neighbourhood, we can observe that the critical cut-off neighbourhood has no hotspot in the HH quadrant, which represents the majority of municipalities for the contiguity approach.
## quadrants.knn
## HH HL LH LL None
## 1 3 2 5 155
## quadrants.dnb
## HL LH LL None
## 4 3 7 152
## quadrants.cont
## HH HL LH LL None
## 6 2 3 4 151
In order to better visualize them, the plot is served, showing in white the municipalities with no quadrant and the one inside the Moran’s quadrants with a gradient. Moreover, three of them are generated, one for each notion of neighbourhood, choosing a random \(k\) or \(d\) for knn and critical cut-off. This choice is due to the fact that different neighbourhoods lead to different hotspots, i.e. the points that emerge over others in the Moran’s scatterplot.
⚠️ It is suggested to open the image in a new window to better read municipalities labels, whose size has been reduced for readability issues.
par(mfrow=c(3,1))
par(mar=c(0,0,0,0))
quadrants = list(list(quadrants.knn,"KNN"), list(quadrants.dnb,"Cut-off"), list(quadrants.cont,"Contiguity"))
for(l in quadrants){
colourization = unlist(color_mapping[l[[1]]])
par(mar=c(0,0,1,0))
plot(tn, col=colourization, border = "grey", main=paste0("Regions with influence on students over population under 20 (neighbourhood = ",l[[2]],")"))
legend(x=11.38, y=45.95, legend=c("Low-Low", "Low-High", "High-Low", "High-High","None"),
fill=unlist(color_mapping), bty="n", cex=0.8)
text(tn_coords, labels = ifelse(l[[1]]=="None", "" ,tn@data$Comune), cex=0.7)
}
As can be noticed, while critical cut-off and knn focus on the same
eastern area of Trentino, the contiguity approach selects municipalities
all over the province, in particular Trento and Rovereto, the most
populated ones, and the least ones in the border.
While knn and contiguity show Novaledo as HH, KNN and Cut-off also show Frassilongo, Valfloriana, Castello Tesino and Cinte Tesino (and others not always highlighted before), described above as recurring ones in the not HH quadrants.
# Municipalities in common between KNN and cut-off
tn$Comune[quadrants.knn != "None" & quadrants.dnb != "None"]
## [1] "Castello Tesino" "Cinte Tesino" "Imer" "Mezzano"
## [5] "Palù Del Fersina" "Ruffrè-Mendola" "Terragnolo" "Valfloriana"
# Novaledo in common in contiguity and knn
tn$Comune[quadrants.cont != "None" & quadrants.knn != "None"]
## [1] "Novaledo"
# Nothing in common between contiguity and cut-off
tn$Comune[quadrants.cont != "None" & quadrants.dnb != "None"]
## character(0)
Local statistics can be tested for deviations using the hypothesis of
absence of local spatial autocorrelation and hence can provide the
statistical significance of the local spatial patterns detected by the
Moran scatterplot. In particular, the function localmoran()
provides:
By sorting results according to the municipality with highest absolute value of the local Moran statistic \(I_i\) and by filtering those whose p-value is below \(0.05\), we obtain the areas with highest statistical significance of the local spatial patterns, in a clearer way than the Moran scatterplot. Specifically, Sagron Mis, Cinte Tesino and Castello Tesino seem to be the municipalities with highest Local Moran’s, but comparing the top municipalities in lmI with those in the quadrants computed through the scatterplot, we may notice that only 41% of municipalities identified through the plot are spatially significant in the Local Moran’s.
lmI <- localmoran(tn$Pop_under_20_Pop_tot, dnb24.list,
na.action = na.exclude)
lmI = data.frame(lmI)
rownames(lmI) = tn$Comune
lmI_sign = lmI%>%
filter(lmI$Pr.z....E.Ii..<0.05) %>%
arrange(desc(abs(Ii)))
lmI_sign
## Ii E.Ii Var.Ii
## Sagron Mis 1.57033462 -0.0238863438 0.5326885407
## Cinte Tesino 1.24845968 -0.0497550803 0.2954593485
## Castello Tesino 1.13353122 -0.0497550803 0.2068618156
## Terragnolo -0.75066355 -0.0238863438 0.0876576080
## Folgaria -0.73194582 -0.0238863438 0.0737503913
## Pieve Tesino 0.66809695 -0.0144718200 0.0556222840
## Besenello 0.57752990 -0.0142739171 0.0337151866
## Calliano 0.49829283 -0.0142739171 0.0391648397
## Garniga Terme -0.43182077 -0.0074038592 0.0130176630
## Villa Lagarina 0.42370263 -0.0142739171 0.0404068536
## Tesero -0.41250827 -0.0025976459 0.0146859689
## Imer -0.39954581 -0.0025976459 0.0244219796
## Ziano Di Fiemme -0.37794170 -0.0025976459 0.0118012250
## Scurelle -0.35235327 -0.0072625000 0.0220710321
## Panchià 0.34818046 -0.0026824614 0.0117050773
## Cavedine -0.31211344 -0.0074038592 0.0161647903
## Cimone 0.28662508 -0.0072625000 0.0142044128
## Predazzo 0.26963547 -0.0026824614 0.0176012917
## Canal San Bovo 0.26247100 -0.0026824614 0.0144768020
## Riva Del Garda 0.25257812 -0.0025976459 0.0093972718
## Pomarolo 0.22822567 -0.0025976459 0.0067842791
## Madruzzo 0.22355566 -0.0025976459 0.0047116002
## Tenno 0.20378836 -0.0025976459 0.0079314466
## Vallelaghi 0.18627981 -0.0025976459 0.0042459434
## Nogaredo 0.17568546 -0.0025976459 0.0079314466
## Primiero San Martino Di Castrozza 0.17276227 -0.0003076266 0.0036396092
## Mezzano -0.14283306 -0.0002793548 0.0026324811
## Cavalese -0.10504774 -0.0002793548 0.0010499637
## Bieno -0.08170975 -0.0002793548 0.0010891592
## Volano 0.07758130 -0.0002793548 0.0008278560
## Nomi 0.07758130 -0.0002793548 0.0006890387
## Aldeno 0.06802252 -0.0002793548 0.0005502214
## Castelnuovo 0.06641020 -0.0003076266 0.0010403384
## Castel Ivano 0.06284719 -0.0003076266 0.0009414393
## Trento -0.06230966 -0.0003076266 0.0004912425
## Fiavè 0.05927446 -0.0002793548 0.0008020295
## Calceranica Al Lago 0.04781105 -0.0002793548 0.0005653651
## Z.Ii Pr.z....E.Ii..
## Sagron Mis 2.184298 0.0289403760
## Cinte Tesino 2.388348 0.0169242900
## Castello Tesino 2.601655 0.0092775185
## Terragnolo -2.454746 0.0140984325
## Folgaria -2.607279 0.0091264854
## Pieve Tesino 2.894156 0.0038017867
## Besenello 3.223035 0.0012684020
## Calliano 2.590015 0.0095971794
## Garniga Terme -3.719856 0.0001993361
## Villa Lagarina 2.178830 0.0293443031
## Tesero -3.382501 0.0007182914
## Imer -2.540056 0.0110834586
## Ziano Di Fiemme -3.455143 0.0005500015
## Scurelle -2.322855 0.0201869651
## Panchià 3.243025 0.0011826773
## Cavedine -2.396630 0.0165466044
## Cimone 2.465867 0.0136682213
## Predazzo 2.052598 0.0401115535
## Canal San Bovo 2.203743 0.0275424098
## Riva Del Garda 2.632319 0.0084804174
## Pomarolo 2.802385 0.0050726283
## Madruzzo 3.294722 0.0009851925
## Tenno 2.317416 0.0204810691
## Vallelaghi 2.898632 0.0037479503
## Nogaredo 2.001861 0.0452996502
## Primiero San Martino Di Castrozza 2.868760 0.0041208491
## Mezzano -2.778407 0.0054626149
## Cavalese -3.233278 0.0012237826
## Bieno -2.467407 0.0136095455
## Volano 2.706080 0.0068082610
## Nomi 2.966171 0.0030153231
## Aldeno 2.911816 0.0035933470
## Castelnuovo 2.068496 0.0385934480
## Castel Ivano 2.058308 0.0395606052
## Trento -2.797422 0.0051512242
## Fiavè 2.102880 0.0354762928
## Calceranica Al Lago 2.022525 0.0431221200
# Proportion of significant municipalities in Moran's Scatterplot
# versus those identified through local Moran's
sum(c(tn$Comune[quadrants.knn != "None" |
quadrants.dnb != "None" |
quadrants.cont != "None"]) %in% rownames(lmI_sign))/dim(lmI_sign)[1]
## [1] 0.4054054
The following two plots show:
par(mar=c(0,0,1,0))
brks <- sort(as.numeric(lmI[,1]))
colours <- colorRampPalette(c('#fedb71','#dd696d'))(length(lmI[,1]))
plot(tn, col=colours[findInterval(lmI[,1], brks, all.inside=TRUE)],
border="grey30")
title(main="Local Moran's I values")
text(tn_coords, labels = ifelse(tn@data$Comune %in% rownames(lmI_sign), tn@data$Comune ,""), cex=0.7)
# Mapping the p-value as color
pvalue_colors = c("white","#fedb71", "#F6BF70", "#E6866E","#DD696D")
pval <- as.numeric(lmI[,5])
colpval = ifelse(pval>0.05, pvalue_colors[1],
ifelse(pval>0.01 & pval<=0.05, pvalue_colors[2],
ifelse(pval>0.001 & pval<=0.01,pvalue_colors[3],
ifelse(pval>0.0001,pvalue_colors[4],pvalue_colors[5]))))
tn$colpval[pval>0.05] <- "white"
tn$colpval[pval<=0.05 & pval>0.01] <- gray(0.9)
tn$colpval[pval<=0.01 & pval>0.001] <- gray(0.7)
tn$colpval[pval<=0.001 & pval>0.0001] <- gray(0.4)
tn$colpval[pval<=0.0001] <- "black"
plot(tn, col=colpval)
legend(x=11.5, y=45.9, legend=c("Not significant",
"p-value = 0.05", "p-value = 0.01", "p-value = 0.001",
"p-value = 0.0001"), fill=pvalue_colors, bty="n", cex=1)
title(main="Local Moran's I significance map")
text(tn_coords, labels = ifelse(colpval == "white","", tn@data$Comune), cex=0.7)
Comparing the results obtained with the Scatterplots’, we can state that
while the critical cut-off found significance in the eastern part
(e.g. Primiero, Predazzo, Castello Tesino etc), the contiguity approach
helped to find significance near Riva del Garda and around Trento.
The following grid of images shows the Moran’s Scatterplot within all different notions of neighborhoods, considering the mean of students per school. As in the previous case, the pink line has a positive slope, whose confidence interval continues to grow by increasing the \(x\) value. Still, some differences can be noticed from a neighbourhood to another and some municipalities, like Mezzocorona, Villa Lagarina and San Michele All’Adige, continues to switch from above to below the mean dashed line.
In HH quadrant, Trento, Rovereto, Mori, Mezzocorona, Avio, Lavis and Ala seem to be the municipalities with more students per school in the majority of plots, but also Brentonico, Giovo and Nago-Torbole, except when considering the critical cut-off neighbourhood.
On the LH part, communities like Drena (surrounded by Arco, Dro and Cavedine), Fornace (Baselga di Pinè and Civezzano), Ronzo-Chienis, Vallarsa (Ala, Rovereto) and Terragnolo (Rovereto, Folgaria) are those municipalities that have few students (less than \(100\) per school), but surrounded by municipalities with a lot of them.
This chunk shows how many municipalities belong to a specific quadrant, based on the different concepts of neighbourhood. It seems like 16 or less over 166 municipalities have a spatial autocorrelation and belong to a specific quadrant. In particular, both knn and cut-off neighbourhoods do not provide hotspots for the LL quadrant, while it exceeds with HH municipalities, compared to contiguity neighbourhood.
table(quadrants.knn)
## quadrants.knn
## HH HL LH None
## 12 1 2 151
table(quadrants.cont)
## quadrants.cont
## HH HL LH LL None
## 3 2 6 5 150
table(quadrants.dnb)
## quadrants.dnb
## HH HL LH None
## 6 2 3 155
The map displays the membership of municipalities to each of the quadrant in the Moran’s scatterplot, while in white leaving the remaining territories with no quadrant.
Starting with KNN version, the Adige valley is completely in red, showing the spatial similarity of high number of students per school: from Mezzocorona, to Trento, continuing to the neighbourhood of Rovereto and the bottom-center part of Trentino.
The cut-off neighbourhood instead considers partially the same territories of KNN, with less focus on Rovereto and more on some LH municipalities, as Vallarsa, Trambileno and Terragnolo.
In the end, the contiguity approach detaches itself from previous approaches to show apparently random municipalities, some in common with knn and cut-off (e.g. Roverè della Luna, Lavis, Trambileno, Nago-Torbole and Mori).
par(mfrow=c(3,1))
par(mar=c(0,0,0,0))
quadrants = list(list(quadrants.knn,"KNN"), list(quadrants.dnb,"Cut-off"), list(quadrants.cont,"Contiguity"))
for(l in quadrants){
colourization = unlist(color_mapping[l[[1]]])
par(mar=c(0,0,1,0))
plot(tn, col=colourization, border = "grey", main=paste0("Regions with influence on mean number of students per school (neighbourhood = ",l[[2]],")"))
legend(x=11.38, y=45.95, legend=c("Low-Low", "Low-High", "High-Low", "High-High","None"),
fill=unlist(color_mapping), bty="n", cex=0.8)
text(tn_coords, labels = ifelse(l[[1]]=="None", "" ,tn@data$Comune), cex=0.7)
}
# Municipalities in common between KNN and cut-off
tn$Comune[quadrants.knn != "None" & quadrants.dnb != "None"]
## [1] "Avio" "Lavis" "Levico Terme"
## [4] "Mezzocorona" "Mori" "San Michele All'Adige"
## [7] "Trento" "Villa Lagarina"
# Novaledo in common in contiguity and knn
tn$Comune[quadrants.cont != "None" & quadrants.knn != "None"]
## [1] "Lavis" "Mori" "Nago-Torbole"
## [4] "Roverè Della Luna"
# Nothing in common between contiguity and cut-off
tn$Comune[quadrants.cont != "None" & quadrants.dnb != "None"]
## [1] "Lavis" "Mori" "Trambileno"
This time, we will focus on the mean of students per school, whose values comprehend also NAs
lmI <- localmoran(tn$Media_stud_scuola, dnb24.list,
na.action = na.exclude)
lmI = data.frame(lmI)
rownames(lmI) = tn$Comune
lmI_sign = lmI%>%
filter(lmI$Pr.z....E.Ii..<0.05) %>%
arrange(desc(abs(Ii)))
lmI_sign
## Ii E.Ii Var.Ii Z.Ii Pr.z....E.Ii..
## Mori 1.14769596 -1.274208e-01 3.457825e-01 2.168445 0.0301248235
## Rovereto 0.95675307 -3.235236e-02 9.344641e-02 3.235648 0.0012136693
## Ala 0.88594399 -3.205998e-02 1.960825e-01 2.073123 0.0381608025
## Terragnolo -0.80223050 -1.453669e-02 4.645805e-02 -3.654491 0.0002576923
## Villa Lagarina 0.73378003 -3.710805e-02 8.164128e-02 2.697966 0.0069764617
## Riva Del Garda 0.63274583 -1.853564e-02 5.657698e-02 2.738097 0.0061795852
## Arco 0.54009855 -1.745431e-02 4.063626e-02 2.765853 0.0056774037
## Trambileno -0.49254292 -7.064617e-03 2.480930e-02 -3.082212 0.0020546836
## Nomi -0.44465445 -6.151782e-03 1.256184e-02 -3.912421 0.0000913755
## Vallarsa -0.42027295 -6.600309e-03 3.097603e-02 -2.350411 0.0187526702
## Comano Terme 0.41901158 -1.481270e-02 2.896546e-02 2.549022 0.0108025338
## Castello Tesino 0.40983270 -8.294430e-03 3.886036e-02 2.121070 0.0339159370
## Dro 0.40079097 -1.627262e-02 3.070579e-02 2.380083 0.0173087268
## Brentonico 0.38962567 -5.162108e-03 1.899779e-02 2.864258 0.0041798754
## Isera 0.32999486 -5.162108e-03 1.262598e-02 2.982742 0.0028567906
## Volano 0.28532162 -2.301253e-03 5.644814e-03 3.828235 0.0001290653
## Tesero -0.27678277 -2.267688e-03 1.263693e-02 -2.441998 0.0146062139
## Pomarolo 0.27627134 -2.578646e-03 5.472494e-03 3.769450 0.0001636078
## Cimone -0.26761447 -5.933436e-03 9.583481e-03 -2.673072 0.0075160175
## Pieve Tesino 0.25582979 -2.259517e-03 9.157454e-03 2.697011 0.0069965046
## Telve Di Sopra 0.25234970 -5.099504e-03 1.118395e-02 2.434411 0.0149160511
## Ronzo-Chienis -0.24180042 -2.395013e-03 6.098588e-03 -3.065627 0.0021721415
## Besenello 0.22446379 -3.024324e-03 5.984671e-03 2.940618 0.0032755862
## Fiavè -0.22348486 -3.453953e-03 9.126847e-03 -2.303156 0.0212700840
## Folgaria -0.21443171 -1.584808e-03 4.361390e-03 -3.222960 0.0012687340
## Nago-Torbole 0.21043758 -1.793810e-03 5.806999e-03 2.785055 0.0053518671
## Avio 0.19700373 -4.940377e-04 4.154866e-03 3.063964 0.0021842500
## Telve -0.19179159 -2.578646e-03 5.876713e-03 -2.468219 0.0135787389
## Commezzadura 0.18440533 -2.127966e-03 6.886436e-03 2.247807 0.0245884792
## Nogaredo -0.16622864 -1.121541e-03 2.654520e-03 -3.204592 0.0013525409
## Borgo Valsugana -0.16422476 -1.904932e-03 5.451617e-03 -2.198411 0.0279198154
## Scurelle 0.15276409 -1.029442e-03 3.198248e-03 2.719456 0.0065389455
## Aldeno 0.13042554 -1.914753e-03 3.006207e-03 2.413697 0.0157916059
## Tenno 0.10671555 -5.148562e-04 1.475490e-03 2.791579 0.0052451594
## Madruzzo -0.09417449 -5.934186e-04 8.749834e-04 -3.163648 0.0015580534
## Ledro -0.08626386 -2.538671e-04 1.199040e-03 -2.483888 0.0129956495
## Ziano Di Fiemme 0.07806969 -2.767384e-04 1.458270e-03 2.051636 0.0402050957
## Cavedine 0.06832860 -4.236886e-04 7.594944e-04 2.494738 0.0126050272
## Castel Ivano -0.04601451 -1.420377e-04 4.605713e-04 -2.137488 0.0325583285
## Cavalese 0.03035200 -3.084668e-05 1.141087e-04 2.844259 0.0044514900
## Calliano 0.01691412 -9.165718e-06 2.171805e-05 3.631402 0.0002818859
# Proportion of significant municipalities in Moran's Scatterplot
# versus those identified through local Moran's
sum(c(tn$Comune[quadrants.knn != "None" |
quadrants.dnb != "None" |
quadrants.cont != "None"]) %in% rownames(lmI_sign))/dim(lmI_sign)[1]
## [1] 0.2926829
The following two plots show:
par(mar=c(0,0,1,0))
brks <- sort(as.numeric(lmI[,1]))
colours <- colorRampPalette(c('#fedb71','#dd696d'))(length(lmI[,1]))
plot(tn, col=colours[findInterval(lmI[,1], brks, all.inside=TRUE)],
border="grey30")
title(main="Local Moran's I values")
text(tn_coords, labels = ifelse(tn@data$Comune %in% rownames(lmI_sign), tn@data$Comune ,""), cex=0.7)
# Mapping the p-value as color
pvalue_colors = c("white","#fedb71", "#F6BF70", "#E6866E","#DD696D")
pval <- as.numeric(lmI[,5])
colpval = ifelse(pval>0.05, pvalue_colors[1],
ifelse(pval>0.01 & pval<=0.05, pvalue_colors[2],
ifelse(pval>0.001 & pval<=0.01,pvalue_colors[3],
ifelse(pval>0.0001,pvalue_colors[4],pvalue_colors[5]))))
tn$colpval[pval>0.05] <- "white"
tn$colpval[pval<=0.05 & pval>0.01] <- gray(0.9)
tn$colpval[pval<=0.01 & pval>0.001] <- gray(0.7)
tn$colpval[pval<=0.001 & pval>0.0001] <- gray(0.4)
tn$colpval[pval<=0.0001] <- "black"
plot(tn, col=colpval)
legend(x=11.5, y=45.9, legend=c("Not significant",
"p-value = 0.05", "p-value = 0.01", "p-value = 0.001",
"p-value = 0.0001"), fill=pvalue_colors, bty="n", cex=1)
title(main="Local Moran's I significance map")
text(tn_coords, labels = ifelse(colpval == "white","", tn@data$Comune), cex=0.7)
This last inquiry is due to the great gap discovered in the choropleth map inside the website that shows the proportion of actual students over the total population under 20 years old of every municipality. It seems in fact that some municipalities host more students than those who actually live in that area, leading to the possible conclusion that students may need to move from their city to another to go to school every day.
The remaining grid shows this time a negative trend of the relationship between x and the spatially lagged x, indicating that there are few close municipalities with a high number of students over the young population, but mostly of them share dissimilar values with their neighbours. In fact, this may be due to the problem highlighted before, related to the movement of students across different municipalities to go to school, especially middle, high and professional schools.
By assigning a quadrant to each municipality we can easily access to the
number and the names of municipalities where students come from other
areas or that have the necessity of moving to go to school. In fact, as
shown in the assignation of numbers to quadrants, the most populated
areas are HL and LH, showing a trend of negative spatial
autocorrelation.
table(quadrants.knn)
## quadrants.knn
## HL LH LL None
## 6 4 2 154
table(quadrants.cont)
## quadrants.cont
## HH HL LH None
## 4 7 2 153
table(quadrants.dnb)
## quadrants.dnb
## HH HL LH None
## 2 5 2 157
In this case the color mapping has been changed to highlight the quadrants with most of the municipalities, whose values are dissimilar (i.e. HL in red, LH in yellow).
As in the previous two analysis of the scatterplot, knn and cut-off show more similarities than with the contiguity approach, thus pointing out the municipalities that host more students than those who actually live inside the territory (i.e. they welcome students from their neighbours).
The most critical situation is around Tione di Trento, which has a high number of students over its under 20 population, while Valdaone, Selle Giudicarie, Porte di Rendena and Pelugo lack of students. Notice however that we do not hold data about the number of students in Valdaone and most of them are missing in those zones.
Something similar but limited happens between Canazei and Giovanni di Fassa, since their closeness and the gap of students in the first against the aboundance in the second may imply that students in Canazei move to Giovanni di Fassa to go to school.
color_mapping = list("LL" = "#F6Bf70",
"LH" = "#FEDB71",
"HL" = "#E0716E",
"HH" = "#E6866E",
"None" = "white")
par(mfrow=c(3,1))
par(mar=c(0,0,0,0))
quadrants = list(list(quadrants.knn,"KNN"),
list(quadrants.dnb,"Cut-off"),
list(quadrants.cont,"Contiguity"))
for(l in quadrants){
colourization = unlist(color_mapping[l[[1]]])
par(mar=c(0,0,1,0))
plot(tn, col=colourization, border = "grey",
main=paste0("Regions with influence on students over population under 20 (neighbourhood = ",l[[2]],")"))
legend(x=11.38, y=45.95,
legend=c("Low-Low", "Low-High", "High-Low", "High-High","None"),
fill=unlist(color_mapping), bty="n", cex=0.8)
text(tn_coords, labels = ifelse(l[[1]]=="None", "" ,tn@data$Comune), cex=0.7)
}
# Municipalities in common between KNN and cut-off
# (those that for sure host students from their neighbours)
tn$Comune[quadrants.knn != "None" & quadrants.dnb != "None"]
## [1] "Bondone" "Cles" "Ossana" "Tione Di Trento"
# Nothing in common in contiguity and knn
tn$Comune[quadrants.cont != "None" & quadrants.knn != "None"]
## [1] "Cinte Tesino"
# Bondone in common between contiguity and cut-off
tn$Comune[quadrants.cont != "None" & quadrants.dnb != "None"]
## character(0)
In this section we will resume the conclusions obtained in the Moran’s I test of spatial autocorrelation in OLS residuals, in particular when the hypothesis of no spatial autocorrelation in the residuals is violated, which happened with the Population under 20 over the total population model.
First of all, we can estimate the Spatial Durbin Model (SDM)
Tobler, Waldo R. 1970. “A Computer Movie Simulating Urban Growth in the Detroit Region.” Economic Geography 46 (2): 234–40. http://www.geog.ucsb.edu/~tobler/publications/pdf_docs/A-Computer-Movie.pdf.↩︎
Wikipedia page of Centroid, https://en.wikipedia.org/wiki/Centroid↩︎
How to find the centre of a polygon i Python, https://deparkes.co.uk/2015/02/28/how-to-find-the-centre-of-a-polygon-in-python/↩︎
saveGIF() documentation https://www.rdocumentation.org/packages/animation/versions/2.4.1/topics/saveGIF↩︎
Spatial regression models documentation, https://rspatial.org/raster/analysis/7-spregression.html↩︎
Spatial Autocorrelation, https://ibis.geog.ubc.ca/courses/geob479/notes/spatial_analysis/spatial_autocorrelation.htm#↩︎